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Abstract 

We derive a Ewald decomposition for the Stokeslet in planar periodicity and a novel 
PME-type 0(A^log A^) method for the fast evaluation of the resulting sums. The decom- 
position is the natural 2P counterpart to the classical 3P decomposition by Hasimoto, 
and is given in an explicit form not found in the literature. Truncation error estimates 
are provided to aid in selecting parameters. The fast, PME-type, method appears to 
be the first fast method for computing Stokeslet Ewald sums in planar periodicity, and 
has three attractive properties: it is spectrally accurate; it uses the minimal amount of 
memory that a gridded Ewald method can use; and provides clarity regarding numerical 
errors and how to choose parameters. Analytical and numerical results are give to support 
this. We explore the practicalities of the proposed method, and survey the computational 
issues involved in applying it to 2-periodic boundary integral Stokes problems. 

1 Introduction 

Viscous flow systems continuously enjoy much attention from various scientiflc disciplines, 
such as the widespread study of passive and motile suspensions and other applications in 
bio-, micro- and complex fluidics. An example is the large body of work that involves the 
study of locomotion of small organisms, the basics of which are illustrated in a classic article by 
Purcell [55] , with further developments in e.g. Koiller et al. [37] . Theoretical understanding of 
various modes of propulsion [66, 65, 17, 8, 30, 29] is rapidly progressing apace with simulation 
methods [32, 59]. If history is any guide, analytical results will feed into computational 
models and rich computational studies of complex systems will emerge. There are also notably 
strong interdisciplinary connections present in this area. A recent survey of the modelling of 
biomimetic fluid flow by Saha et al. [57] provides a broader perspective, including modeling 
and computation outside of the viscous flow regime. 

Another area where there has been much activity is in simulation of suspensions of various 
particles in viscous flows, motivated, for instance, by a desire to understand the formation 
of microstructures [27] and paper-making [46]. Here one flnds detailed computational studies 
of rigid or flexible fiber suspension, such as the work by Saintillan et al. [58] and Tornberg, 
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Shelley and collaborators [69, 68, 61], as well as large body of work concerning suspensions 
of various spheroids [7, 64] and related particles [41]. Complex structure formation is ob- 
served, based on accurate modeling of hydro dynamical interactions, in these computational 
investigations. 

Alas, the present work shall not in itself elucidate these fascinating questions of collective 
dynamics, but rather pursue algorithmic development that could help such investigations deal 
with larger systems in future. 

The models under consideration in the works cited, and in countless other investigations, 
are based on the Stokes equations. A variety of numerical and analytical techniques are 
used, but a common feature is the use of singularity solutions. One class of methods is the 
so called distributed singularity approach, in which the various Green's functions of Stokes 
equations are combined in point, line, or surface distributions to generate the flow induced by 
a particular particle or distribution. In this way, slender bodies (e.g. fibers) are represented 
by a line distribution of Stokes potentials along its centerline, as was suggested by Batchelor 
[3]. In an analogous way, spheroids can be represented by a combination of higher-order 
Green's functions, see e.g. Zhou & Pozrikidis [73] and Gotz [18]. 

There are also direct boundary integral methods, as discussed in e.g. Pozrikids [53, 51], 
Anderson et al. [31, 5, 4], and others [35, 71, 72]. Another class of methods that has been 
highly successful is known as Stokesian dynamics, and is due to Brady and collaborators 
[6, 64] . All of these methods are based on distributions of Green's functions that are either 
summed or integrated, often in intricate ways. 

Moreover, results free from finite-size effects are often of interest; the well-trodden path 
towards which is applying periodic boundary conditions to a sufficiently large system [64, 69, 
58]. Solvers that exploit periodic structure, such as the fast Ewald methods that we shall 
survey shortly, are often highly specialized. In particular, there are fundamental differences 
between how full (i.e. applied in all three directions) and lower dimensional periodicity (i.e. 
periodicity with respect to one or two dimensions) enters in mathematical (cf. Pozrikidis 
[52]) and algorithmic terms. Whereas methods are relatively well established for fully periodic 
Stokes problems [64, 58, 42] , methods for problems in planar periodicity, e.g. when periodicity 
is applied to {x,y) and z is "free", are less so (see e.g. [52, 28]). However, such systems, 
including wall-confined systems, applications in biology (such as beating flagella in planar 
configurations [9, 36, 38]), are of growing interest. 

The present work deals with the efficient, 0(A^log A^), and spectrally accurate summation 
of Stokes potentials in the 2P setting. To clarify, let J7 = [0, L)^ and consider Stokes equations 
for X G rj, 

-Vp(x) + ^Au(x) = f (x) 

V • u(x) = 0, ^ ' 

where u(x) denotes the velocity field, p(x) the pressure, and f(x) the force applied to the 
fluid. The fundamental solution of Stokes flow represents solutions of the singularly forced 
Stokes equations, i.e. (1) with f(x) = fo5(x — xq). 
Introducing the Stokeslet, 
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where = 5ij is the identity and x x is the outer product XiXj, the solution of (1) in 
free space with f(x) = fo(^(x — xq) can be written: 

If 1 
u(x) = - — / ^(x - y)fo(5(y - xo)dy = - — 5'(x - xo)fo. 
07r/_i 57r/i 

See e.g. the textbook by Pozrikidis [51, p. 22] for a derivation of (2). A solution u(x),x G 
that satisfies periodicity is expressed as an infinite sum over periodic images of fo convolved 
with S, the Stokeslet, 

u(x) = ^ / S{^-y)Y^ fo<5(y - x„ + r(p))dy 
'S'(x-x„ + r(p))fo, 



where d = 1,2,3 is the dimension of periodicity (recall, x G M'^ throughout) and r : Z'^ — t- 
is a translation into the periodic image p. Under fully periodic boundary conditions (3P) one 
would let r(p) = Lp, and in the 2-periodic (2P) case we let t(p) = [Lp, 0]. 

In light of the applications surveyed above, we let f denote an array of point forces, 

N 

f(x) = 5]f„5(x-x„), (3) 

n=l 

where f„ may contain any particular set of e.g. quadrature weights and physical constants. 
We focus on the evaluation the periodized Stokeslet sum 

N 

= E E ^(^ - ^« + ^(p))f«' ■ (4) 



yd n=l 



Note that the terms in (4) decay as 1/r, and (4) is obviously not absolutely summable. 
Takemoto et al. [67] discuss this in the related context of the electrostatic potential. However, 
the practical computation of (4) is plainly infeasible, even for small N, so fast methods are 
essential. Alternatives exist at this point, including pursuing extensions to the fast multipole 
method (FMM) by Greengard and Rokhlin [22]. There exist FMMs that incorporate peri- 
odicity in 3D, e.g. Kudin & Scuseria [40] for electrostatics, and FMM-related methods for 
Stokes [19] in 2D with periodicity. However, the dominating framework for periodic problems 
in this setting incorporates periodicity by using Fourier transforms. The basic principles for 
how to proceed in that direction have long been clear, neatly divided into two stages: 

(a) Decompose the Stokeslet sum (4) into rapidly converging parts, e.g. by applying ideas 
from Ewald summation. 

(b) Devise a method to reduce the complexity of computing the decomposed Stokeslet sums, 
e.g. by means of FFT-based methods. 

The extent to which this has been realized depends on the periodic structure of the 
problem. In the fully periodic (3P) setting, various decompositions exist that clarify (a), and 
methods with 0(A^log A^) complexity have been developed (6), as we survey in Section 2. 
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In the case of planar periodicity (2P), the picture is much less clear. Existing decompo- 
sitions pertaining to (a) are surveyed in Section 3, after which we derive an Ewald-type sum 
for computing (4) (Section 3.2). A fast, 0{N log N), method is presented in Section 4, which, 
as far as we know, is the first reported method that deals with (6) in the 2-periodic setting. 

Interestingly, the fast method we propose (6) is, in some ways, simpler than the underlying 
Ewald sum (a) that we derive. As shall become clear, this has to do with a relationship 
between the 2- and 3-periodic settings, which we exploit for the fast method. 

We demonstrate several appealing characteristics of the proposed method: spectral accu- 
racy; efficiency in both run-time and memory; clear error estimation and parameter selection; 
and a close and revealing correspondence to methods for the 3P problem. This will, we hope, 
facilitate future computational investigations of micro- and complex flow systems, enabling 
large systems to be simulated accurately. 



2 Stokeslet Ewald sum in full periodicity 

We start by summarizing well-established results for the fully periodic case. Before interest 
in solving Stokes equations took off there was already a large body of work established on a 
related problem in electrostatics ~ summing Coulomb potentials under periodicity (solving a 
Poisson problem, rather than Stokes) - known as Ewald summation. The basic principle is 
that the potential, (p ^ 1/r, is split into a short range part that is exponentially decaying, 
and a long range part that is very smooth (and thus exponentially convergent in reciprocal 
space) . 

Pioneering work by Hasimoto [24] showed that a 3-periodic vector field u^'^(x) can be 
computed in a Ewald-like manner. 



N 

peZ3 n=l 



+ I? E Bit k)e-'=V^«^ f: f^e-^-^-^— ) - i^f„, (5) 

^ k^O n=l 



where 



/vr 



with r := ||x||, and 

S(^,k)= (^l + ^j^(fe2l-k®k). (7) 

The parameter ^ > 0, which u^^ is independent of, is known as the Ewald parameter and 
controls the rate at which the two sums converge relative to each other. Note here that the 
two sums in (5) have the desired structure - the first is a sum in real space that converges 
roughly as e~^^^^ , and the second is a sum in k-space that converges as roughly as e~^'^^^^^ . 
Other decompositions exist, see Pozrikidis [52, Sec. 3.1, 4.1]. 
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2.1 Fast methods in full periodicity 

The ~ 1/r convergence of the original Stokeslet sum (4) is vastly improved by the Ewald 
method (5). However, the complexity of computing the Ewald sum (5) is still severely limiting. 
There are two reasons for this: First, summing (5) for all has 0{N'^) complexity. Secondly, 
the constant hidden in the formal complexity can be very large; in fact, it grows cubically 
with higher accuracy. 

Faster methods for the corresponding Poisson problem in electrostatics and molecular 
simulation have been around for three decades, following work by Hockney & Eastwood [26]; 
see the survey by Deserno & Holm [12], or recent work by the present authors [44]. 

Such methods have been adapted for the 3P Stokes Ewald sum (5), staring with Sierou & 
Brady [64] (embedded in the framework of "Stokesian dynamics"). Their method is based on 
the Particle Mesh Ewald (PME) method by Darden et al. [11]. Saintillan et al. [58], in their 
method for sedimenting fibers in Stokes flow, base a fast method for the Stokeslet sum on a 
refinement of the PME method, known as Smooth Particle Mesh Ewald (SPME) [14]. The 
SPME method is known to be more accurate than its predecessor, though still of polynomial 
order. Recognizing that the exponentially fast convergence of the underlying Ewald sums is 
lost when a traditional PME approach is used, the present authors presented a spectrally 
accurate PMEl-type method for Stokes in [42]. 

3 2-periodic Stokeslet Ewald sum 

Perhaps counter to intuition, one should not expect a method for 2-periodic systems to 
follow by elementary manipulations of the 3P decomposition (5) - to the contrary, one is 
best advised to start anew from the Stokeslet sum (4). It is also instructive to review the 
corresponding transition from 3P to 2P in the electrostatics setting, as we do in [43]. One 
finds that consolidation, on the level of agreeing on a preferred decomposition, has yet to 
happen (cf. e.g. the 2P Ewald sum in Gryzbowski et al. [23] and a survey of non-Ewald 
methods by Mazars [50]), and that fast PME-type methods are considerably less mature than 
their 3P counterparts, though we hope that [43] contributes in this direction. 

There does exist 2P Ewald decompositions for Stokes - notably in Pozrikidis [52] , relating 
to earlier work by Ishii [28]. While it should be emphasized that [52] is among the most 
valuable references in the present context, we find the results given therein lacking in two 
respects: First, it turns out to be quite straight-forward to derive and present a 2P Stokes 
Ewald sum in explicit form, whereas Pozrikidis [52, Sec. 2.2] is content with giving a "gen- 
erating function" and a differential operator. Secondly, and more importantly, the results on 
offer in [52] do not, at least to us, suggest how a fast PME-type method could be developed. 

Such a fast method being our objective, we now set out to derive a 2P Ewald sum for 
Stokes in a suitable way. To give an overview, we start by deriving a pure k-space solution 
to a 2-periodic Stokes problem that decays as z — t- ±00, which we then decompose using the 
"screening function" (15) proposed by Hernandez-Ortiz et al. [25] to generate the Hasimoto 
decomposition (5)-(7) in the 3-periodic setting. We then consider the family of solutions 
obtained by adding a piecewise linear function, showing that this admits an end-result which 
is differentiable. Under certain conditions, we show that these smooth solutions have finite 
limits as z — ^ ±00, as expected physically. The resolution of the smoothness/singularity issue 
is quite compactly discussed in related work [52, pp. 83-84], [28, p. 676]; this may suit some 
readers more than others. 
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Figure 1: Planar periodicity (2P): Primary cell replicated infinitely in the plane. 

3.1 2P Stokeslet Ewald derivation: preliminaries 

3.1.1 A mixed Fourier transform 

The conventions for Fourier transform pairs used in the present work are given in Appendix 
B. As in [43], we start with observing that the periodic structure of a 2-periodic function /i(x) 
implies that its spectral representation will be mixed in the following sense: Let /i(x) = h{p, z) 
be periodic in the (x, y)-plane and "free" in z, i.e. p := (x, y) £n = [0, Lf and z £R. Then 
a Fourier representation of h is 

1 fOO 

Kp,z) = -Y. /i(k,Ac)e^^-V-dK, (8) 
2vr ^ J-oo 

where k G 2-^1? /L and k G M. We shall assume that h{p, z — t- ±00) decays faster than any 
inverse power of z. In this setting, (8) exists. Also under the present assumptions, 2P versions 
of several familiar results from spectral analysis hold, such as the Poisson summation formula, 
Parseval/Plancherel's theorems and the convolution theorem, see [43]. 

3.1.2 Solution of a 2-periodic Stokes problem 

Before we address Ewald summation for the 2P Stokes problem, we need to establish (slowly 
converging) solutions of the original problem that, crucially, vanishes as z — t- itcxD. Consider 
the Stokes equations, 

-Vp(x) + /xAu(x) + f (x) = 
V • u(x) = 

under 2-periodic boundary conditions with respect to Presently, assume that the source 
term, f, has a mixed Fourier transform, f(k, k), and expand u and q := Vp likewise, 

u(x) = u(p, z) = ^Y.I "(k, ^V'-'e'^'dK (9) 

k 

Vp(x) = Vp{p,z) = ^Y.I q(k,K)e^''V^MK. 
27r ^ Jk 
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Inserting into Stokes equations, with k := (k, k) G {27rZ^/L} x R, 

q + /.||kfu = f 
zk • u = 0. 

As u should vanish at infinity we may omit summing over k = 0, because u(0, k) = is 
impHed. Ehminating the pressure gradient above, q(k, k) = (k (g) k)f(k, K)/||k|p (since q is a 
gradient q(k) is parahel to k), gives 



u 



ufk, k) 



1 



/X \||k| 



Hence, in Hght of (9), 

u{p, z) 



27r/u 



E 



f(k,K), k/0. 



f (k, Ky^-Pe'^^'dn 



(10) 



(11) 



Recalling f(x) from (3), we absorb a factor Svr/i into the coefficients f„ for convenience and 
conformity with convention. With that, (11) becomes 



u(/9, Z) 



L2 



E 



N 



n=l 



where the integrals are computable. We obtain (see Appendix A.l): 

4 ^ 

= Z2 E E ^(k,^ - ^„)fne*'^-(^-''"\ 



(12) 



k^On=l 



where 



Q{k,z) 



llkllkl 



2||k|| 
(7r||k|||z|+7r)fc| 



hiiTzki 



(7r||kH|2|+7r)fc^ 7r(||kj||2| + l)fcifc2 

_7r(||k|||2| + l)fclfe2 

2pc]p 2||ki|2 

— ^iTTzki —^iTTzk2 ^{7r\\]i.\\\z\ + ir)^ 



\iTTzk2 



(13) 



Note that u is well-defined, but, as will be discussed in Section 3.2.2, fails to be differentiable 
in an (x,y)-plane around each z„. 



3.2 2P Stokeslet Ewald derivation 

A fruitful and flexible approach to deriving Ewald sums is to compute convolutions of source 
term with a so called screening function - this is the classical approach for the 3P Laplace 
(electrostatics) problem. In planar periodicity the singularities encountered are more severe, 
and the appropriate behavior in the limit z — >■ ±00 not as straight forward as the correspond- 
ing "tin foil" (or, for Stokes, "pressure gradient counters net force") conditions that enter 3P 
derivations. Our approach here is to construct a decomposition of (12) and then address the 
regularity issue. 

By "screening function" we mean any normalized function, ||7(x)||2 = 1, that decays 
smoothly away from zero, such that a decomposition, 

f(x) = {{5 - 7) * f)(x) + (7 * f)(x) = f(x) - f^(x) + f^(x), 
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where (5(x) denotes the Dkac measure on and f'y(x) := (f * 7)(x), is well defined. With 
this, clearly, 

u(x) = ^ / 5(x - y) (f (y) - f^(y)) dy + / 5(x - y)f^(y)dy. (14) 

As noted, the standard derivation for the 3P Laplace case is to take 7 as a pure Gaussian, 
compute the first convolution directly and treat the second term in reciprocal space. In- 
terestingly, in a short paper sparse on details [25], Hernandez-Ortiz et al. give a screening 
function, 

7(0 = ^e-«^^^ - CV^) , (15) 
that generates exactly the 3P Hasimoto decomposition (5)-(7). 
3.2.1 The real-space sum 

The first term in (14) can be computed directly, as is standard in this context (though the 
calculations are laborious), and one finds that 

N 

/ 5(x - y) (f (y) - f^(y)) dy = ^ ^ A{t x - x„ + Lp)f„, (16) 

■^^'^ n=l peZ2 

where A is the same (though summed over a two-dimensional lattice this time) vis-a-vis the 
3P decomposition (6). 

Note that a particle does not itself contribute to the potential field, or flow, it experiences, 
so it is natural to simply drop the term in (16) that corresponds to p = when m = n. 
However, by the decomposition (14), part of the contribution that we're trying to remove has 
gone into the second term. By computing the difference between the free space Stokeslet and 
the real-space term, we can find this contribution. This is to be subtracted off, so we do it 
with the opposite sign: 

hm (^(x,e)-5(x))= hm L(a + erf(e||x||))^ + (a-erf(e||x||))^) =-iil, 

||x||^0 l|x||^0 V ||x|| ||x||^ / \/TT 

where a := 2^7r~-^/^||x||e^''^ll'''l^. By convention, this term is denoted "self interaction" which 
can lead to some confusion, since it's the term removing self interaction that is included in 
the k-space sum. Nonetheless, we have arrived at two parts of the 2P Stokeslet Ewald sum, 

N * 

u«(x^) := E E ^(^' - x„ + Lp% (17) 

Usclf(Xm) := l=fm- (18) 

\/tt 



The notational inconvenience *, signifying the omission of the term {p = 0,n = m} in (17), 
is standard. 
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3.2.2 The k-space sum 

Turning to the second term in the Stokeslet decomposition (14), first note that both terms 
in (14) are solutions to Stokes equations. Moreover, both £y and f — £y are admissible under 
the assumptions of Section 3.1.2. The Fourier transform of 7 over is 

In light of (11) we let £y = f * 7 generate a function u'^ that satisfies — >■ as z — >■ ±00, 

"V.)^5i^gX(^-!^)MMK^-v-M„ 



4 

1? 



4 

1? 



(19) 

k^O"^^ n=l 



where we have used Poisson summation and grouped similar Fourier coefficients. 

For future reference, we note that this expression is the starting point from which we 
develop a fast PME-type method. Given the present mixed periodic setting, the close corre- 
spondence between (19) and (7) is entirely expected, and illuminating per se. 

As it transpires in Appendix A. 2, the integrals above, 

can be computed. For clarity of notation is is useful to let 

Q(k,z) = Qi(k,z) + Q'^«'^(k,z), 

where the meaning of the superscripts is implied from the terms in the first factor under the 
integral in (20). With this, and the computations in Appendix A. 2, it follows that we may 
write u'^ as a sum, 

4 ^ / 

vf{pm, ^m) = 72 E E Zn.n) + Re Q"^®"^(k, Z„„)) COs(k • Pmn)- 

+ Im Q»^®»^(k, z^n) sin(k • p^„)) f„, (21) 

where 

Qi(k,z) = 2(':|M + ji(||k||,.)^I (22) 
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and 



Q^^^{k, z) 



(23) 



The terms J^" and K'^ are short-hand for the various scalar integrals that can be identified by 
studying the integrand in Q. With 



A 



fc2/4g2_|2^2 
k 



e 

e'^^'erfc 



e ''"'erfc 



2e 



2^ 



-) 



we can write down the computed integrals as 

4{z,k) 
4iz,k) 

4{z,k) 

4{z,k) 

4{z,k) 

K\{z,k) 
Kl{z,k) 



/vrAC 
vr (6'_ + 
Ak 



+ 



9- + 



8k^ 



k + ^/7^\C 



vr 



16^2 



k 9- +9+ 1 

- H + - 

8k 8 



l6kC^ 



4^ 



vr 



16e 



+ 



+ 



(24) 
(25) 

(26) 

(27) 
(28) 

(29) 

(30) 

(31) 

(32) 

(33) 



Despite some difficulty of notation, the sum (21) is straight-forward to evaluate. 

Now, recall that we are presently aiming for a decomposition of the pure k-space solution 
(12), wherein the k = term was dropped to ensure decay as 2; — >■ ±00. However, by the 
decomposition (14), part of the k = mode has gone into the real-space sum (17) and must 
be subtracted off. Computing (12) explicitly has provided a k-space view of the real-space 
sum, so that the term to be removed can be extracted as 



4 / ^ 

n*{z) = liin (u(k) - u^(k)) = lim ^.iQi^ 

\n=l 

Computing the desired limit, one finds 

lim (Q{k,z)-Q(k,. 

k— >0 V 



Zn) - Q{k,Z- Zn))fn 



a{z) 
a{z) 
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where 



a{z) := -ir\z\ — 7rzerf(2;^) 



e ^ a/vt 



2^ 



(34) 



so that 



4 

u*(z) = 2_, - ^n)l2fn, I2 : = 

ra=l 



1 
1 




We now have the desired decomposition of (12), 

u(Pm, Zm) = U^(pm, Zm) + U^{Pm, Zm) - U* (Zm) + Ugelf- 

Naturally, the non-smoothness in (12) is present here (in the term u*). However, we can 
recover a smooth solution by relaxing the restriction that u— T-Oasz— t-iLcxd. To any u 
that satisfies the Stokes equation (1) we can add a linear function. By that token, and in 
close analogy to how the 2P Ewald sum for electrostatics was obtained in [43] , we subtract a 
piecewise linear function. 



u 



F,k=0 



(^m) 



4tt 

1? 

4 

1? 



N 

El 

n=l 



Zn\i-2^n - U* 



E ( ^(^m - Zn)eii{{Zm - Zn)C) + 
n=l V 



2^n- 



With this, the derivation of the 2P Stokes Ewald sum is complete. We have that 

u(Xm) = \l{pm, Zm) = U^iPm, Zm) + U^{Pm, Zm) + U^'^=^ (Zm) + Usdf, 



(35) 



(36) 



from (17), (21), (35) and (18). These are nice and exponentially convergent sums, but, 
as discussed in the introduction, the complexity of evaluating them for a large system is 
debilitating. Note that u^''*-^'' contains terms that look like an unbounded shear flow in z. 
However, if the coefficients sum to zero in the first two components, u^''^^'' tends to finite 
limits as z — 7- ±00. As in the electrostatics case, one can show that 



lim u(p, z) - lim u(p, z) ] = ^ 



TV 



N 



J2zJnh, if ^(f„),=0,i = l,2. 



n=l 



n=l 



That is, in the planar directions (x, y) the flow sees a transition (across z G (0, L) continuously) 
proportional to the dipole moment in z. 

3.3 Truncation error estimates for 2P Stokeslet Ewald sums 

It's noteworthy that the usefulness of Ewald sums (cf. Section 4.4, on parameter selection) 
rests on the availability of truncation error estimates. Therefore, we need to endow the 2P 
Stokeslet Ewald sums, (17) and (21), with such estimates. 
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To this end, let the real-space sum (17) be truncated such that only interactions between 
particles (including periodic images) within a distance rc from each other are included. That 
is, let 

N * 

^rS^m) := XI XI lr,(||xm-x„ + r(p)||)^(^,Xm-x„ + r(p))fn, 

n=lpeZ2 

where lrc('^) denotes the indicator function which is one if r < rc and zero otherwise. Esti- 
mating ||u'^ — u^ll is most tractable in the RMS norm. 



N 

'^(u(Xn) - U*(x„))2. 



N ^ 

n=l 



(37) 



In the electrostatic case, Kolafa & Perram [39] have derived famous truncation error estimates 
for randomly scattered particles in RMS norm. However, the diagonal terms in A are Ajj = 
erfc(^r)/r — 2^exp(— ^^r-^)/-y/7r, whereas in the Laplace case only the former (and smaller) 
term is present. On the other hand, if we disregard the off-diagonal terms in A, the key step 
in [39, Appendix A, Eq. (14)] becomes tractable for the Stokeslet, and we get estimates for 
components j = 1,2,3, 



-Tins, J J 




A^^^r^ sin(6l)drd6ld</. 



/27r 



^i'ri _ 7rerfc(^rc)2 + ^eMV2^r, 



(38) 



where Qj := Z]n=i(/i)n- what follows, we suppress the vector index j. 

The estimate (38) is to be treated as such - it does not include the full operator and it 
is statistical, the latter meaning that it's only valid if Xm is randomly distributed. None the 
less, (38) is very predictive within its domain, as we illustrate in Figure 2. Here we have 
N = 10000, Xm randomly from a uniform distribution over 17 — [0, 1)"^, fm from a uniform 
distribution centered at zero and the RMS average was formed over 30 random x^. Finally, 
for selecting parameters one would ideally like to solve e^g(^) = e for ^, but this does not 
appear tractable. Rather, we series expand the error estimate for large ^rc, obtaining that 



(39) 




and consequently, assuming that Tc > L e /(4(5), 



\ 




The agreement between computed errors and the simplified estimate (39) is illustrated in 
Figure 2. 

For the k-space sum (21), truncated as ||k|| < 2T:koo/L^ it's harder to follow Kolafa &: 
Perram [39] and derive a corresponding estimate. Rather than to ignore the issue altogether. 
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Figure 2: Real-space sum truncation error (37) and estimate (38) of j;-component, in RMS- 
norm, as a function of truncation radius. N = 10000, $7 = [0, 1)'^, ^ = 4, 8, 12 (top to bottom). 
As dashed, simplified estimate (39), hardly distinguishable. 

which would break the way parameters are selected (cf. Section 4.4), we take a heuristic 
approach. Based on existing results in the RMS setting, it's natural to suppose that an 

estimate of the form e^-^^ ~ C^Qk^S^e ^ «^ , for particular a, 6, can be of value. A few trial 
runs quite clearly suggest that good agreement is obtained if a = —3/2 and 6 = 3. Moreover, 
we let C = L^TT"'*, obtaining a heuristic, or practical, error estimate for the truncation error 
of the k-space sum (21), 



In Figure 3 we give results that demonstrate that this estimate captures the truncation error 
well for a range of parameters. Here, ^ = 4, 8, 12 and L = 1, 3, = 400, 200 and the RMS- 
average is formed over a random set of 30 points^. We observe satisfying agreement, and, as 
appropriate, that the estimate is somewhat conservative. 

Pertaining to both the real- and k-space sums, the RMS measure will underestimate errors 
if particles are not randomly scattered. In that case, it becomes appropriate to consider oo- 
norm estimates. The essential property, that the sums converge at least as fast as exp(— r^^^) 
and exp(— (7rA;oo/(C-^^))^) respectively, still holds, though obtaining reliable estimates with 
detailed constants, as in the RMS case, is bound to pose a challenge. 

4 Fast methods for 2P Stokes Ewald sums 

In Section 2.1 we briefly surveyed fast, ©(AlogA), methods for the fully periodic (3P) 
problem, and we shall adhere to same framework under 2P: 

a) The real- space sum (17) can be made arbitrarily cheap to compute by choosing the 
Ewald parameter, ^, sufficiently large (cf. Section 4.1); 

^The small systems used to evaluate the estimate (40) are small - limited by the computational cost of 
evaluating (21) for large ^. 




■oo 



(40) 
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Figure 3: Truncation error of k-space 2P Stokeslet Ewald sum (21) in RMS norm (37) vs. 
number of modes koo, i.e. ||k|| < 2TTk^/L. Left: L = 1,N = 400. Right: L = 3, = 200. In 
both panels, ^ = 4, 8, 12 (left to right), and (solid line) heuristic error estimate (40). 

b) by using fast Fourier transform (FFT), the (non-singular) k-space sum (21) can be 
evaluated in linear time (independently of ^); 

c) the (2P-only) singularity contribution (35) is easily dealt with as a one-dimensional 
interpolation problem. 

It is really item (6) that is the heart of the matter, that and the glue that hold the parts 
together: parameter estimation. In the 3P setting one starts from the Ewald sum (the second 
term in (5), in the Stokes case), and the steps to obtain a fast, FFT- based, method are quite 
intuitive. That is not to say that the matter is simple; a rather large debt of insight is owed 
to the early pioneers in the field, such as Hockney & Eastwood [26] and Darden et al. [11]. 

Attempts to do the same thing in planar periodicity immediately run aground on the alge- 
braic structure of the corresponding k-space Ewald sum (21) (cf. e.g. Grzybowski et al. [23] 
for Laplace). Looking at the k-space 2P Stokes Ewald sum, and its constituent expressions, 
(21)-(33), it is not hard appreciate the challenge in finding transforms and approximations 
that mimic the 3P approach. In the electrostatics setting, a variety of non-Ewald (cf. Mazars 
[50]) and non-2P Ewald methods (cf. Arnold et al. [2] and the references therein), have 
been developed instead. These methods have not been adapted to Stokes (i.e. deriving cor- 
responding Lekner sums [50] or correction terms [2]), and it is quite evident that, at best, a 
substantial effort would be required to do so. 

What we propose, in contrast, starts from the mixed sum/integral (19). Formulating a 
FFT-based (PME) method becomes a quite straight-forward matter. It follows the same line 
of reasoning that we set forth in [43]. First, however, we give a few remarks regarding item 
(a). 
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4.1 Real-space summation in linear time 

The truncated real-space sum 

N * 

UrMm) = Yl lre(l|Xm -X„ + r(p)||)^(^,Xm -X„ + Lp)f„, 

n=l peZ2 

with 

is, in order to benefit from much related work, most conveniently written as 

uf^(x„)= ^ ^(e,x^-y)f(y), (41) 
yeNL™ 

where NL^ = NLmC^^c) denotes the set of near neighbors to x^ (counting periodic images, 
if necessary). The point being that if |NLm| is constant, for all m, as A^, the number of 
sources, grows, then evaluating (41) for all x^ has complexity 0{N) instead of 0{N^). This 
presupposes that all neighbor lists NL^ can be found in linear time, which is the case. In 
fact, it is quite elementary and we refer the reader to the textbook by Frenkel & Smit [15, 
Appendix F], and the recent paper [44] by the present authors where additional details are 
given. 

Note that each neighbor list, NL^, can be viewed as the non-zero pattern for a row in a 
sparse matrix. In light of this, it's natural to suggest that the evaluation of (41) for all x^ 
be implemented as a sparse matrix-vector product, u-^ ^ ^[(rc, {xm})vec(f). The 3A^ x 3A^ 
matrix A has a 3 x 3 block structure corresponding to Cartesian components Aij, but each 
block has the same sparsity pattern. If (41) is to be evaluated for many f, as is the case in 
the context of iterative solution of boundary integral equations (cf. Section 5.1), the matrix 
form saves a very large amount of redundant arithmetic [58]. In a setting where locations 
change, as in fiber simulations [68] , the matrix elements of A have to be recomputed, but the 
sparsity pattern can be valid for several time steps^. 

4.2 SE2P Stokes: Fast k-space method 

The fast method that we give here is based on earlier work by the present authors [42, 44, 43], 
and this exposition is somewhat condensed and, in contrast to our previous work, draws in 
its structure and notation from the elegant treatment of the 3P electrostatics case by Shan 
et al. [60]. 

Instead of starting from the 2P k-space Stokes Ewald (21), we invoke the mixed sum- 
integral form (19) 

■^Typically, the neighbor hsts NL^ are constructed with some margin, ||x — y|| < rc + r' , to allow particles 
to move a distance r' before the neighbor list has to be recomputed. 
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It is fairly easy [43] to show that u^(xm) can be obtained as a convolution of a particular 
smooth function, : i7 x M — >■ M'^, with a suitably scaled Gaussian centered in x^, 

u^(/9„, zm) = C' f f i^{p, z)e~^\\p-P^\\'e-^\'-'^\''dpdz, (42) 

where 



and 7/ > is a parameter. To define ^, and show how it enters, we recall the decomposition 
(14) and note that we can view i?(lk)e~l'''^ll as the k-space representation of the "regu- 
larized" Stokeslet (5 * 7)(x), as discussed at the outset of Section 3.2. In Section 3.2.2 we 
proceeded to obtain the Ewald sum (21) by convolving 5*7 with f (x) and laboring over the 
resulting integrals (the convolution itself was trivial though). Mesh-based Ewald methods de- 
viate at this point. The logic is to consider not f but a regularization, fr;(x) = {G{r]) * f)(x), 
and adjust the Green's function accordingly. That is, choose a kernel G and determine a 
modified Green's function, B, such that 

u^ = {S* 7(0 * f)(x) = (f * G{ri) *B* G(r/))(x). (43) 
It turns out to be advantageous [44] to let G = C'e-^ll^ll", so that, 

N 

f^(p, z) = G'Y, e-'^ll''-^'""*e-^l^-^'"l'f„, (44) 

n=l 

by which it modified Green's function, B = e~^^^'^''"l''*'il^/^^^i?(k, k), follows. The convolution 
f,y * is conveniently computed as a multiplication in frequency domain, 

V^(k, k) - |g_(i_^)(fc2+^2)/4^2^^j^^ ^^^^j^^ Otherwise ' ^^^^ 

The last convolution in (43), {ip*G{r])){-K), brings us back to (42). In [42, 44, 43] detailed and 
constructive derivations are given instead of this sketch. To clarify, we give a few remarks, 
starting with a summary of the method: 

Summary of method 

To compute the (non-singular) k-space contribution u-^(xm) for all m the steps are as 
follows: (i) evaluate fj] (44) on a uniform grid over 0,; (ii) compute a mixed Fourier 
transform to arrive at f^^; {iii) multiply with modified Green's function according to (45); 
(iv) an inverse mixed transform yields ip on a. regular grid, so that; (v) the convolution 
(42) can be computed for all x^- In steps (i) and {v), the Gaussians are truncated to 
have support on grid points, for some P determined by the acceptable approximation 
error. 

Fast transforms; uniform grids PME-methods become efficient by requiring that the reg- 
ularized charge distribution f,y(x) be evaluated on a uniform grid. The transforms are 
then handled by FFTs. 
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Mixed transforms Recall the spectral representation (8) of 2P functions in terms of discrete 
Fourier series in (x, y) and a continuous transform in z. Hence, tlie transforms — y 
and 'ip ^ if) are mixed. This, together with the exclusion of k = in (45) and the 
inclusion of the singularity contribution (35), is where our proposed 2P method deviates 
from the well-established 3P methods. 

Fourier integral quadrature Computing the mixed transforms requires two standard dis- 
crete Fourier transforms and an approximation of the Fourier integral (in the z-direction) . 
The quadrature step has to be done in the same time-complexity as the discrete trans- 
forms; otherwise, the method would not be more efficient than directly summing (21). 
This is discussed at length in [43], and it is shown to be quite satisfactory to use an 
FFT-based method (following Press et al. [54, Sec. 13. 9]) on a moderately oversampled 
grid. 

Spectral accuracy Trapezoidal quadrature applied to (42) is spectrally accurate, due to the 
-regularity of the integrand. Detailed analysis on this appears in [44]. Moreover, 
the accuracy in computing the Fourier integral in the mixed transform using a simple 
FFT-based approach strongly depends on the regularity of the integrand. 

Grid size corresponds to direct sum truncation As is also extensively discussed in [44] , 
the proposed method enjoys a close relationship between the Ewald sum (21) and the 
fast method of this section. A finite truncation |k| < ^Txk^jL of the Ewald sum (21) 
corresponds to a grid of size M = 2koQ -|- 1. Due to the construction of the PME-method 
based on Gaussians (with C°°-regularity) , the truncation error from the Ewald sum car- 
ries over to the fast method, meaning that the appropriate grid size can be estimated 
using truncation error estimates for the Ewald sum. This is, as of yet, conjecture in the 
Stokes 2P setting, verified numerically (cf. Figure 5, right). 

Parameters and truncation The free parameter, r/, can be used to control the width, 
denoted of the Gaussian used for the convolutions in (44), (42), and we find it natural 
to prescribe this width in terms of a number of grid points, w = LP/{2M) = hP/2. 
Gaussians lack compact support, but they are highly localized. It is natural to truncate 
them, as is done in the non-uniform EFT [21, 13]. We let P < M denote the number of 
grid points within the support of each Gaussian, as seen in Figure 4 (bottom). Moreover, 
it's important to have control of the shape, parameterized with m > 0, of the Gaussians 
(independently of ^), see Figure 4 (top). It then follows naturally [44] to let 



Periodicity Above, || • H* denotes that periodicity is implied. To prove that (42) equals 
(19) one needs this notion to be exact (which rules out the well-known "closest image 
convention"). Formally, as Gaussians lack compact support, it should be understood 
e.g. that e~'^ll^~''™'ll* := I]pez2 s"^"''"^™''''^^*'^"^- After truncating the Gaussians, this 
ceases to be relevant - simply extend the domain to accommodate the support of the 
truncated Gaussians; evaluate (44) on this domain, taking Euclidean distance between 
particles and grid points; and then additively wrap the extended domain into the original 
one. 




(46) 
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Figure 4: Top: Gaussians with different sliape parameters, mi < m2 < m^. Bottom: Gaus- 
sian witli support on P grid points around Xj 

Approximation errors By approximation errors we mean tlie errors tliat tlie fast SE2P 
method contributes (in addition to the spectrum truncation errors that are inherited 
from a finite truncation of (21)). These fah into three categories: (i) the numerical 
error from evaluating the integral (42) with a trapezoidal rule Tp; (ii) the truncation 
of the Gaussians; and, finally, {in) the error in computing the Fourier integral in the 
mixed transforms. Regarding (i) and (ii), we can prove a detailed error estimate [44, 
Thm. 3.1], namely that 



Naturally the first term corresponds to (i) and the second to {ii). It also reveals that if 
m is taken large the first term goes slower to zero, indicating that selecting m ~ x/ttP 
strikes a balance between the terms that is favorable for the total error. The final, 
and 2P specific, error concerns the Fourier integral quadrature, and is, as noted above, 
examined in detail in [43]. 

Fast gridding The main trade-off for a spectral method is that Gaussians have wider sup- 
port than e.g. the Cardinal B-splines (used in the SPME method [14]) have. By using 
the fast Gaussian gridding (FGG) procedure, proposed by Greengard &: Lee for the 
non-uniform FFT [21], we mediate this to a large extent. Again, this is examined in 
[44], and detailed algorithms are given. 




(47) 
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Figure 5: Left: Approximation error in oo-norm for SE2P-Stokes method, and (dashed) error 
estimate (47), as a function of the support of truncated Gaussians, P. Here, = 20, M = 20, 
^ = 4. Right: (*) truncation error for the k-space Ewald sum (21), vs number of modes /^oo 
(i.e. |k| < 2tikoo/L). As (•) error in SE2P-Stokes method vs. grid size, where P selected to 
make approximation errors smah (left). This isolates the "spectrum truncation" error of the 
fast method and shows that grid size in the fast method can be selected based error estimates 
pertaining only to the underlying Ewald sum. 



4.3 The singular term 

The last term to evaluate is the contribution from the singular integrals that were taken out 
of the k-space sum (21), 

4 ^ 

n=l 

ff(z):=7rzerf(ze) + ^e-^'«', (48) 

cf. the Laplace case [43]. Because g only depends on z it's almost trivial to compute u-^'^^'^ 
using an interpolation approach. We advocate a Chebyshev method as follows: (i) evaluate 
yF,k=o^g^^ where s denotes the set of Mq Gauss-points scaled to the relevant interval; (n) 
compute the Mq coefficients of the interpolating Chebyshev polynomial, C,{z)^ by means of 
elementary recursions; {in) evaluate u^'^^^{zm) ~ Ci^m) using a numerically stable Clenshaw 
formula [54, Sec. 5.4]. Properties of Chebyshev interpolation, including spectral accuracy, 
are discussed in e.g. Rivlin [56] and many elementary textbooks. 



4.4 Parameters, estimates and modus operandi 

A commonly cited drawback for Ewald methods, particularly for fast EFT- based ones, is that 
there are several parameters to choose appropriate values for. For 3P Laplace, this is exten- 
sively discussed by Deserno & Holm in their famous survey [12]. Recent results and extensive 
discussion is found in [70]. In the 2P Laplace case, the situation is, unsurprisingly, quite 
misty. Kawata & Nagashima [34] have proposed a method, which shares some foundations 
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with the present work, where it's suggest that an optimization approach be apphed to tune 
a set of eight parameters. The situation is, hopefully, clarified to a large extent in [43]. 
We suggest that the parameters fall into three natural categories: 

1. The desired accuracy, e, in the computed u with respect to a particular norm. 

2. The Ewald decomposition parameter, ^, and the truncation of real- and k-space Ewald 
sums, Tc and k^o, for particular e. 

3. Parameters relating to the fast method: 

a) the grid size, M, (in each dimension); 

b) the support, P, and shape, m, of Gaussians, in the FFT-based method for u^, and 
the oversampling factor s/ in the associated mixed transforms; 

c) the number of Gauss-points, Mq, in the interpolation method for u-^'^^'^. 

Item one, choosing e, comes first - all other parameters will depend on it. We noted that 
the cost of the real- space sum can vary by orders of magnitude depending on it's implemen- 
tation and whether a sparse matrix can be assembled and reused or not. With this in mind, 
we argue that the next consideration should be to select the real-space truncation rc such 
that the real-space is cheap to compute. 

Item two is clarified by the truncation error estimates of Section 3.3. We noted that, in 
terms of e and rc we can take 



That is, ^ is chosen large enough that the error committed by truncating the real-space sum 
at Tc is close to e. Fourth, we invoke the equivalence between k-space truncation and grid in 
the PME-type method. That is, the truncation error estimate (40) is inverted. 



where W{-) is Lambert W function [10], and the grid size selected as M = 2A;oo + 1. 

The remaining parameters, {P,m, Sf}, are all accuracy parameters, pertaining to the 
approximation errors added by the fast method, that don't depend on the system. The one 
with greatest impact on the computational cost is P, the discrete support of the truncated 
Gaussians, and (cf. Figure 5, left) rough estimates are that P = 15 is appropriate for accuracy 
around 10~^^ and P = 23 is required for full (double-precision) accuracy. As previously 
determined, it's natural to let m = CVttP, and we've found that this constant is best taken 
slightly below unity, C = 0.92. The FFT-oversampling in the z-direction, sj, should be at 
least two, but should be four or six for higher accuracies (cf. Figure 5). Finally, the number 
of Gauss-points in the ID interpolation used to compute u^''^^'^ should be small, Mq < 100, 
typically around 50. 

We would like to emphasize that, although there are a number of parameters to select, 
the procedure is eminently straight-forward. In particular, PME-grid size selection based on 
a truncation error estimate for the underlying Ewald sum, is an important feature. 




(49) 




(50) 
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5 Numerical results 



In the introduction several areas of ongoing research involving boundary integral methods for 
Stokes were noted, and we would like to put the present method into that context. This also 
gives the opportunity to illustrate parameter selection and the practical characteristics of our 
method. 



5.1 Fast boundary integral methods in a periodic setting 

As an example, consider a large number of rigid spheres, Fj, i = 1 . . . A'^s, in 2P Stokes flow. Let 
fi denote the (unknown) Stokeslet density on Fj. Under the constraint of planar periodicity 
we write the velocity field generated by all {fj,rj} as 

1 ^= r 

= ^ E E / ^(x - y + T(p))f,(y)dr(y). (51) 

Recall that the periodized Stokeslet sum (4) was given a well-defined meaning by the 2P 
Stokeslet Ewald decomposition (Section 3). The same logic shall apply to (51). We refer the 
reader to e.g. [51, 71] regarding which Green's functions are required to represent particular 
flow cases, noting that for this external flow it suffices to use the Stokeslet. 

The boundary integral setting is vastly expanded in comparison to the exposition hitherto. 
Issues that arise, as concisely reviewed in Ying et al. [71] , include fast summation methods (to 
which this work pertains), accurate quadrature rules (surveyed below) and domain boundary 
representation. 

Let each sphere L' be discretized with a set of quadrature points, x^-, j = 1, . . . , Np, and 
let Tx denote integration with respect to x by a simple quadrature rule, T[C,] := J2j C(xj)if j ~ 
/pC(x)dr(x). Moreover, let T"'*^ denote the so-called "punctured" rule, where Wk = 0. 

For x on some Lj, the integral over Lj in (51) is singular when p = 0. Discretizing it in 
the Nystrom fashion, we apply the punctured rule, 

u^(x„) ~ / 5(x^ - y)f (y)dr(y) 

p=.rO'™[5(x„-y)f(y)]. 

This has one major benefit and one drawback. The benefit is that it renders the discretization 
of (51) equivalent to the 2P Stokeslet ewald sum (36) - c.f. the exclusion of the term {n = 
m, p = 0} in the real space sum (17). The drawback is that the punctured rule is only first 
order accurate. 

To obtain higher accuracy, without sacrificing the structure that lets us apply the Ewald 
decomposition, one can add a local correction, TJJJ^,, to the punctured rule, 

u\^m) ^ TO'™[5(x„ - y)f (y)] + T^,. 

Such local corrections to simple quadrature rules (that allow integrable singularities to 
be handled) have been extensively investigated, and are, of course, unrelated to the planar 
periodicity of the present problem. Progress has been on a case-by-case basis, with different 
corrections developed depending on the singularity and the geometry of the boundary, T. 
Early references here include Lyness [47] and Kapur & Rokhlin [33], with subsequent work 
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by e.g. Aguilar & Chen [1] and Marin et al. [49]. The latter gives formal proofs to the 
effect, roughly, that on a uniform grid in d dimensions where the singularity has order q and 
p is the radius (in terms of grid points) of the modified quadrature rule, accuracy of order 
0{h'^P~^'^~^'^~^'^) is attained. Furthermore, Marin et al. in [48] developed such formulas for the 
Stokeslet, S, though only of the case of T being flat. 

An alternative that could be expected to yield local corrections, in a way suitable for pair- 
ing with fast Ewald methods, is the contour integral formulation of Bazhlekov, Anderson and 
Meijer [4] . Other alternatives for quadrature over integrable singularities includes singularity 
subtraction, as discussed in the rich survey by Pozrikidis [53], and methods based on carefully 
selected variable transformations, as discussed in e.g. Sidi [62, 63] and Ying et al. [71]. 

Moving on, for on Fj, we discretize (51) as 

u(x^)«-^ (T0'™[5(x™-y)f^(y)] + TE, + f; J2 Ty[S{^^ - y + r{p))fl (y)]] 

V i=ipez2\o J (52) 

= g^(Ew2p(x,f) + TI^,(S,f)). 

Here, x denotes the set of = NgNp discretization points (i.e. x = {x* : i = 1, . . . , Ns,j = 
1, . . . , Np}). Correspondingly, f denotes the set of discrete Stokeslet strengths multiplied by 
appropriate quadrature weights, f = {fjWj}. 

The mapping f — >■ u, for fixed x, is to be understood as a linear system of dimension 
3N X 3N; knowing u, we can solve for f . Such an inversion is by necessity iterative, as the 
mapping does not have a closed form. Indeed, the Ewald summation method is required 
to make the action of the periodized Stokeslet convergent (including constraints on f and 
physical conditions at 2: — )■ ±00). 

The fast Ewald method of Section 4 acts as an O(A^logA) "matvec" operator, for use 
inside an iterative solver (such as GMRES). The ultimate complexity then depends on the 
convergence of this iterative procedure (whether the number of GMRES iterations depends 
on N or not), which leads to the expansive topic of preconditioning; Ying et al. [71] use a 
method due to Greengard et al. [20] , wherein additional references are found. 

In this context it's natural to remark that a second-kind boundary integral formulation is 
expected to be more advantageous with respect to iteration convergence than the first-kind 
equation (51). This motivates the development of Ewald decompositions, and associated fast 
methods, for the double-layer ("doublet" or "stresslet") Stokes potential. 

Finally, it should be noted that investigations of physical systems generally pose the 
boundary integral problem in a different form. Rather than having boundary conditions for u 
on all particles, kinematic conditions (which include boundary integration) are used and the 
system is closed using force and torque balances. Such a formulation is used for sedimenting 
fibers [69, 58, 68]. Ewald techniques apply and can handle the main computational task in 
these sedimentation problems, though additional considerations emerge. 

5.2 Stokeslet Ewald as a matvec operator 

Having posed a relevant boundary integral problem, we now discuss how to deploy our method 
in detail and give an indication of how efficient it is. Several of the topics touched upon in the 
previous section, such as preconditioning and local quadrature correction terms, are beyond 
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Figure 6: Nearly uniform point distribution on sphere by subdividing icosahedron. Left to 
right: Np = 12,42,162. 

the scope of the present work. A central concern is clarifying how to balance the computational 
cost of the terms in (36). 

5.2.1 Parameters, accuracy and cost 

We consider a system with A'^s spheres, each with a radius r^, in a domain J7 = [0,L)^, and 
assume that the spheres' center are separated by at least Sr^. A nearly uniform distribution 
of points on each sphere, as seen in Figure 6, is obtained by subdividing an icosahedron [45]. 

We target an accuracy e ~ 10"^ in the Ewald method. The cost of computing the 2P 
Stokeslet Ewald sums (17) and (21) depends on the density of the system. Throughout, we 
shall take an initial number of spheres in a domain with L = and then grow the system 
at constant density to, say, L = 3L0. 

Assuming uniformity, the number of points within a ball with radius Tc is A„(rc) = 
4:irr^NsNp/(3L^). This represents the number of near neighbors that have to be considered 
for a particular real-space truncation radius, Vc- The cost of computing the real-space sum 
(cf. Section 4.1) is roughly NsNpNn{rc)X, where A represents the arithmetic cost associated 
with each interaction, plus the cost of finding the neighbor list. The factor A is one (i.e. 
one floating-point multiply-add) if the real-space sum is computed as a sparse matrix-vector 
multiplication and on the order of 100-1000 otherwise (because of the computations of erf(-) 
and exp(-)). 

Table 1 attempts to give an overview of the relationship between real-space sum cost, the 
system size, and what is then implied for the k-space sum by the inverted error estimates 
of Section 4.4. As expected, making the systems denser requires ^ to grow for the cost 
of the real-space sum (in terms of near neighbors) to remain fixed. Correspondingly, the 
FFT-grid M = 2koo grows to maintain a fixed accuracy. In what remains, we shall take 
P = 16, the FFTs oversampled by a factor four in the z-direction and Mq = 80 Gauss points 
in interpolation method for (35). With Tc = 1/2,.^ = 9,M = 30, and x the points from 5 
randomly placed spheres, we compute RMS errors: 2.2 x 10"^ and 6.2 x 10^^ for the real-space 
sum and SE2P method respectively. 

5.2.2 Fast implementations and run-time profiles 

In Figures 7 and 8 we give timing results as the system is scaled up and show how the run- 
time is distributed between the steps in the algorithm. Here we fix the average number of 
near neighbors at roughly 400 and 1000 respectively, and scale the system up under that 
constraint. The following remarks clarify the situation: 
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Given 


Derived 


Ns = 
Ns = 


100, Np = 
100, Np = 


-- 12, Nn = 
-- 12, Nn = 


= 100 
= 1000 


rc = 0.27,e = 
rc = 0.58, C = 


17.7, Ajqo 
8.3, koo = 


= 27 
= 12 


Ns = 
Ns = 


100, Np = 
100, Np = 


= 42,iV„ = 
= 42,iV„ = 


= 100 
= 1000 


rc = 0.18,e = 
rc = 0.38, C = 


26.8, A;oo 
12.5, 


= 41 
= 19 


Ns = 
Ns = 


1000, Np 
1000, Np 


= 12,iV„ 
= 12,iV„, 


= 100 
= 1000 


rc = 0.13,e = 
rc = 0.27,e = 


37.9, k(yQ 
17.7, k^Q 


= 58 
= 27 


Ns = 
Ns = 


1000, Np 
1000, Np 


= 42,iV„ 
= 42,iV„ 


= 100 
= 1000 


rc = 0.08, C = 
rc = 0.18,e = 


57.2, koo 
26.8, fcoo 


= 89 
= 41 



Table 1: Parameter examples for 2P Stokeslet Ewald summation, with e = 10^ and L = 1. 
Ng'. number of spheres. Np-. number of points per sphere. Nn'. number of near neighbors to 
account for in real-space sum. Right column: parameter values selected based on truncation 
error estimates (39) and (40). 

Remark 1 (Real space computation) The real space sum is evaluated in the matrix form 
as discussed in Section 4-1 ■ The matrix is computed once and the time to do so is amor- 
tized over a hypothetical number of iterations: 50 (which is reasonable, see [58]). This pre- 
computation contains two parts: first the matrix elements are computed (a), then the sparse 
matrix is finalized (b). The real-space contribution (17) is computed as a matrix-vector prod- 
uct (c). The bars in Figures 7 and 8 (left) are stacked from bottom to top in the order (a), 
(b), (c) - showing that, at 50 iterations, constructing the sparse matrix is still the dominating 
cost. 

Remark 2 (Computation of k-space method) The tasks in the FFT-based method to 
compute u^ are as follows: (a) compute grid function (44); (b) compute oversampled trans- 
forms; (c) solve Poission problem (45); and (d) compute convolution with Gaussians (42). 
The bar stacks in Figures 7 and 8 (right) breaks down the total time for the method in these 
steps from bottom to top. 

Our implementation was mostly in Matlab code, and, hence, FFTs were handled by the 
highly optimized library FFTW [16p. The code for gridding (44) and integration (42) with 
the FGG algorithm [21, 44] '^^^ implemented in G and the time- critical parts were hand-coded 
at machine level with SSE instructions and software prefetching. These implementations run 
close to peak flops of present hardware. An untuned implementation in G runs roughly at a 
third of the speed, making the gridding steps dominate transforms. 

Remark 3 (Computation of u^''^^'^) The computation of u^'k=o as discussed in Section 

4.3 involves computing nzerf^zS]) + ^e~^^^^ for z = Zm — Zn, where n = 1,...,N and 
m = 1, . . . ,Mq (the number of Gauss-points for the Ghebyshev interpolation). In the iterative 

^By convention, the fast Fourier transform is said to have complexity C'(A^ log A'^), but this concept encom- 
passes a lot of variability which is seen in practice. For instance, an FFT of length 2" will typically be several 
times faster than a transform of length 2" + 1. The grid sizes seen in the scaling tests here try to avoid the 
best and worst case transform lengths, to faithfully represent the method. 
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3 



N 



X 10 



,4 



N 



4 

X 10 



g = 9, r, = 1/2, e = 10~9, - 400 



L = 

Ns = 
N = 
M = 



1 1.2 1.4 1.6 1.8 2 2.2 

80 138 220 328 467 640 852 



2880 4968 7920 11808 16812 23040 30672 



30 36 42 48 54 60 66 



Figure 7: Runtime to compute (36) as system is scaled up, keeping the number of near 
neighbors fixed at around 400, according to the table above. Left: Real-space sum, cf. 
Remark 1. Right: FFT- based k-space method, cf. Remark 2. 

setting, where we wish to rapidly evaluate u-^'*'^'' for many f, this arithmetic should be pre- 
computed and stored in a matrix, Aq. Then the sum in (48) is computed by multiplying Aq 
with fj, i = 1,2. The interpolation procedure follows, which is very fast. Corresponding to the 
runs in Figures 7 and 8, the times to compute u^'^^*^ were vanishingly small. 

We note that balancing the computational cost of the real-space sum and k-space fast 
method is non-trivial. At a basic level, it hinges on how dense the system is. Evaluating the 
real-space sum using a neighbor list method (matrix-form or not) requires the list to be stored. 
In the iterative setting, the matrix form offers superior performance - so much so that the 
main constraint of the real-space method becomes memory (i.e. limiting Tc for a particular 
A^). In the fast k-space method the split in computational burden between transforms and 
gridding also depends on density - or rather, it depends on the Ewald parameter g, which has 
to increase to keep the number of near neighbors to account for in the real-space sum fixed if 
the system is made denser. 

6 Conclusions 

In this paper we have derived a Ewald decomposition for the Stokeslet in planar periodicity 
(Section 3.2) and a PME-type 0{N log N) method for the fast evaluation of the resulting 
expression. The decomposition is the natural 2P counterpart to the classical 3P decomposition 
by Hasimoto [24]. Truncation error estimates are provided to aid in selecting parameters 
(Section 3.3). The fast, PME-type, method (Section 4) for computing the k-space sum (21) 
is based on a mixed sum/integral form (19), and is similar to the method by the present 
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Figure 8: Runtime to compute (36) as system is scaled up, keeping the number of near 
neighbors fixed at around 1000, according to the table above. Left: Real-space sum, cf. 
Remark 1. Right: FFT- based k-space method, cf. Remark 2. 



authors [43] for the corresponding problem in electrostatics. This appear to be the first fast 
method for computing Stokeslet Ewald sums in planar periodicity, and has three attractive 
properties: it is spectrally accurate; it uses the minimal amount of memory that a gridded 
Ewald method can use; and a clear view of numerical errors and how to choose parameters 
is provided (Section 4.4). The final part explores the practicalities of the proposed method, 
and surveys the computational issues involved in applying it to 2-periodic boundary integral 
Stokes problems. We presently pursue applications-oriented questions in this regard, and 
hope to communicate further results in the near future. 



A 2P Stokeslet sums, details 

A.l Explicit k-space form of Stokeslet sum 

Here we compute 




where k = (k, k) = (A;i,/c2,k) is the composition of the discrete and continuous transform 
variables. The integrals in (53) are computable. We let 

4 ^ - ik - 

u(Xm) = — 2 ^ ^ ^ ^ Q(k, Zm Zn)fnG ^"^ , 
k^O n=l 
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with 



Q{k,z) :-- 



k (gik 



,k||2 + K2 (||k||2 + K2)2 

: Q1+Q2, 



where Qi and Q2 denote integrals over the diagonal- and outer product terms respectively. 
Note that, since k 7^ 0, A;2 > and the integrals are nonsingular. We shall encounter integrals 
of the form 



LPJk,z) :-- 



00 f^p 



kP cos{kz) 



lo (A:2 + K^)i 
for which we have, with X = L, M, that 



dK and MP{k, z) := 



00 i,.Pci- 



sm[KZ ) 





d/i, 



1 dxp 

^ q£Z,q>0. 



2kq dk 
Evidently, 

Qi(k,z) = 2IL?(||k||,z). 
The integral here is known [74, 3.723(2), p. 424], 

-k\z\ 



(54) 



(55) 



L^iKz) 



vre 



2k 



k>0. 



To simplify matters in computing Q2 ~ reducing the number of integrals to labor over - we 
note that the integrand can be viewed as an element-wise multiplication of 



(k,l)®(k,l) 



kf kik2 ki 
kik2 k2 k2 
ki k2 1 



and 



(l,l,K)0(l,l,/t) 

(||k||2 + K2)2 



2\-2 



1 1 K 

1 1 K 

2 

K/ K/ Ki 



where the former is a constant under the integration. The latter demonstrates that there are 
three integrals to compute and arrange as: 



Q2(k,z) = -2 
Having already found L^, (54) gives that 



A:fL^(||k||,z) A:iA;2L^(||k||,z) ikiM^{\\k\\, z) 
feiA;2L0(||k||,z) fc2L0(||k||,z) ik2M^i\\k\l z) 
iA;iMi(||k||,z) ik2M^{\\klz) Ll{\\k\\,z) 



Ll{Kz) 



m-^\^\{k\z\ + 1) 
4P ' 
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Moreover, L2 = Lq — k'^L2, so that 

Ll{k,z) 

by (54). Finally, 



7re^^i^l(A:|z| - 1) 
4k ' 



d 



4{k,z) 



TTZe 



-k\z\ 



dz 4k 

Thus, we have obtained an explicit form of the k-space 2P Stokeslet sum (53): 



where 



TV 



k^On=l 



Q(k,z) 



L? - klLl -kik2Ll -ikiMl 
-kik2L^ L\ - klL^ -ik2Ml 
-ikiMl -ik2Ml L? - L| 



IklLz) 



-IlkllN 



■ HMM±![M 7r(||k|||z|+l)A;ifc2 
2||k||2 2||k||2 
7r(||k|||2| + l)fcifc2 _ (7r||k|||2|+7r)fc2 

2pF ^ 2||k||2 2' 

— kiirzki —hiTTzk2 i(7r||k|||z| + vr) 



-^iiTzki 
\i7rzk2 



A. 2 Explicit k-space form of regularized Stokeslet sum 

Our objective here is to compute the integral Q, given in (20), to obtain an explicit 2P 
Stokeslet Ewald sum. The tensor B (7) contains a diagonal and an outer product term, and 
we integrate them separately. That is, let 



where 



Qi(k,z)=Ie-ll''l'^/^«^ 
= Ie-||k||V4?2 



g(k,z) = Qi(k,z) + Q'^®'^(k,z), 



1 1 \ _^2/Ac2 



1 



+ 



1 



4^2 ||k||2 + K2 



and 



Q'^^*(k, z) 



||k||2/4?2 



1 



4e2||k||2 
1 



+ 



+ 



k)e-"'/^«'e^''MK 
1 



.4e2(||k||2 + K2) (||k||2 + ^2)2 

For non-negative integers p and q we let 
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and 



KP{k,z) :-- 



With these definitions, it is evident that 



Q\k,z)=2(^^ + 4i\\k\\,z) 



The first integral is elementary, 

J^{z) = ^Ce-''^\ 
and the second can be found [74, 3.954(2), p. 504] 

.2^ 



Ji{k,z) 



vre 



4k 



e ''^erfc 



2,^ ) + e'^'^erfc 



2? 



k>0. 



(56) 



(57) 



Moving on to 
find that 



one studies the symmetries of the integrand as when computing Q2, to 



Q*®*(k,z) 



^1 + J2 



kik2 + JP 



ki[^ + iK^ 

ciA^2 (It + ^2°) kl + jo) k2 + ii^l 



4|7 



+ 4 



IklLz). 



The integrals present are related, as previously, via (54) and obvious algebraic relationships. 
With Jg and Ji known, one finds 



2 7O/ 



Jf{k, z) = J^{k, z) - k\^{k, z) 



nn. 1 ^ 72 



J^{k,z) 



Jt{k,z). 



2k dk 

Moving on to the anti-symmetric integrals K^, one can find [74, 3. 954(1), p. 504] 



Kl{k,z) 
and, consequently. 



vre 



-'^^erfc 



k 
2^ 



z^ ] - e'^'^erfc { + z^ 



2^ 



This completes the delicate matter of integration. There remains to find compact expressions 
for the JP- and terms that were left as derivatives above. The forms of (56) and (57) 
suggest that we look for terms 

k 



-'^^erfc 



2e 



k 
2? 



^z 



With this, explicit forms of and KJ^, as given in (27)-(33) follow. 
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B Conventions for Fourier transforms and series 



For compactness, we use a non-unitary form of the Fourier transform. For a function, /(x), 
X G M", which is periodic with respect to f2 C M", we have the Fourier series defined as 

/(x) = ^/ke^'^- 

k 

/k=^/^/(x)e-'^"^dx, 

where k € {27rn/L : n G Z"}. The corresponding integral transform for functions that decay 
sufficiently fast on M" is then naturally 

/>) = / /(x)e-^^-^dx, R G M". 
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